A Comparison Study of the Tehran Norms to the Reference Norms on Children Performance of the Bayley III

Objectives The Bayley Scales of Infant and Toddler Development (3rd ed.; Bayley III) are widely used to assess the cognitive, language, and motor development of children aged 1–42 months. It is unclear whether the reference norms of the Bayley III are acceptable for use in other populations or whether they lead to over- or underestimation of the developmental status of the target children. This study aimed to compare the Tehran norms with the reference norms.
Materials & Methods We used the Bayley III to assess the cognitive, language, and motor development of 1,674 healthy children from health care centers in Tehran. Differences between the scaled scores based on the Tehran norms and those based on the reference norms were calculated. A one-sample multivariate analysis of variance (MANOVA) was used to test the mean difference scores across all subtests. When the MANOVA showed significant differences between the scaled scores based on the Tehran and reference norms, we used univariate analysis to identify which subtests and age groups accounted for these differences. Finally, the proportions of children with low scores (scaled scores <7 or -1 SD and <4 or -2 SD) based on the two norms were compared using the McNemar test to determine the over- or underestimation of developmental delay.
Results The scaled scores based on the Tehran norms differed from those based on the reference norms in all subtests. The mean differences were significant in all 5 subtests (p < .05), with large effect sizes for the receptive communication, expressive communication, fine motor, and gross motor subtests (.20, .23, .14, and .25, respectively) and a small effect size for the cognition subtest (.02). Large effect sizes for all age groups were found for the cognition, expressive communication, and fine motor subtests. More children scored below 1 and 2 SD using the Tehran norms. Using the reference norms resulted in underestimation of developmental delay in cognitive, receptive and expressive communication, and fine and gross motor skills.
Conclusion Population-specific norms should be used to identify children with low scores for referral and intervention. The Tehran norms differed from the reference norms for all subtests, and these differences were clinically significant.


Introduction
Globally, more than 200 million children under the age of 5 years are not achieving their potential for growth, cognitive, and socioemotional development (1). Early interventions are vital to prevent long-term sequelae because the critical period of brain development occurs in the early years of life (2). Children of low socioeconomic status face poor health conditions, large family size, lack of home environmental stimulation, and fewer educational resources (3). An infant's early experiences can affect the whole of later life, since physical, social-emotional, and cognitive development in early childhood provides the basis for the child's development in future years (4,5). Previous studies have determined that the developmental paths of children with different cultural backgrounds, even within the same country, differ significantly for motor and language skills (6).

The Bayley Scales of Infant and Toddler Development (3rd ed.; Bayley III) are the most commonly used tool to assess early developmental status, specifically cognitive, language, and motor skills (7). The Bayley III is an objective test with stepwise guidelines, norm-referenced scores, and appropriate psychometric properties to measure and assess infants' development in health care settings and for scientific research purposes; it was designed and normalized in the US. Given the factors affecting the development of infants, such as genetic features, child-rearing, social habits, ecological characteristics, socioeconomic factors, and the relationship between these factors (8), using reference norms in populations with different features and cultures appears to result in misclassification of developmental delay.
Iran J Child Neurol. Spring 2022 Vol. 16 No. 2 Overestimating development leads to the nonreferral of infants who need intervention and the loss of opportunity for early interventions, while underestimating development increases health care costs, parents' concerns, and unnecessary referrals.
Most studies conducted on child development in developing countries use tests adapted, designed, and validated for western countries (9-11). In some studies, "translating" is the only effort made (12,13); without the accompanying "adaptation" process, translation alone cannot reflect the region's cultural traditions and may lead to misinterpretation of the results (14). Moreover, these adapted tests have limited value without knowledge of the normal range of variation in other populations. Adjustment studies have been conducted in many developing countries, mainly using translated and rarely adapted original tests (15-18). Using the cut-off points of developed countries to assess children in developing countries could therefore lead to misclassification of developmental status (14).

Materials & Methods
This cross-sectional study was conducted in Tehran.
We used the Bayley III, for which the adaptation, psychometric properties, and Tehran cut-off points were determined in previous studies (28,29). The Bayley III covers ages 1–42 months (7). For practical purposes, this age range is divided into 17 age groups. Scaled scores are derived from the raw scores of the Bayley III. Scaled scores range from 1 to 19, with a mean of 10 and an SD of 3. Therefore, in each subtest, a scaled score of 10 indicates mean functioning for that age group, a scaled score of 7 lies 1 SD below the mean, and a scaled score of 4 lies 2 SDs below the mean (7). The Persian version of the Bayley III, whose validity was confirmed in a previous study (28), was used in this study. For Persian-speaking children, the word "candy" was replaced with "cake" and "bird" with "fish," with the illustrations changed accordingly, and the tool "cup" was replaced with "handled glass." Because Persian uses a vowel point to indicate possession, this form of pronouns was also added to the instructions, and the simpler and more popular form of the continuous tense, namely "to have + present tense," was used. Changes were also made to the pronouns.
A one-sample multivariate analysis of variance (MANOVA) was used to test the mean difference scores across all subtests. When the MANOVA showed significant differences between the scaled scores based on the Tehran and reference norms, we used univariate analysis to identify which subtests accounted for these differences. Because the mean differences might be age-dependent, the MANOVA (including all subtests) was then performed separately for each age group. These results were evaluated and interpreted according to Cohen
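To make the univariate follow-up step concrete, the sketch below computes the difference scores for one subtest (Tehran-norm scaled score minus reference-norm scaled score), a one-sample t statistic against zero, and the corresponding partial eta squared. It is an illustration only, not the authors' analysis code: the function names, the example scores, and the benchmark values of .01, .06, and .14 (the conventional Cohen benchmarks for eta squared, assumed here) do not come from the study, and the multivariate test itself is not reproduced.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_effect(tehran_scores, reference_scores):
    """Return (mean difference, t statistic, partial eta squared).

    Difference = Tehran-norm scaled score minus reference-norm scaled score,
    tested against a population mean difference of zero.
    """
    if len(tehran_scores) != len(reference_scores) or len(tehran_scores) < 2:
        raise ValueError("need two equally long samples with at least 2 children")
    diffs = [t - r for t, r in zip(tehran_scores, reference_scores)]
    n = len(diffs)
    sd = stdev(diffs)  # sample SD (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("difference scores have zero variance")
    d_bar = mean(diffs)
    t_stat = d_bar / (sd / sqrt(n))
    # One-sample case: F = t^2 with df = (1, n - 1), so
    # partial eta squared = F / (F + df_error).
    eta_sq = t_stat ** 2 / (t_stat ** 2 + (n - 1))
    return d_bar, t_stat, eta_sq


def cohen_label(eta_sq):
    """Label an effect size using assumed benchmarks of .01, .06, and .14."""
    if eta_sq >= 0.14:
        return "large"
    if eta_sq >= 0.06:
        return "medium"
    if eta_sq >= 0.01:
        return "small"
    return "negligible"


# Hypothetical scaled scores for six children on one subtest (not study data).
tehran = [9, 8, 10, 7, 9, 8]
reference = [10, 10, 11, 9, 10, 9]
d_bar, t_stat, eta_sq = one_sample_effect(tehran, reference)
print(round(d_bar, 2), round(t_stat, 2), round(eta_sq, 3), cohen_label(eta_sq))
```

A negative mean difference in this sketch means that the same raw performance earns a lower scaled score under the Tehran norms than under the reference norms.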

Results
In this study, 1,674 children were enrolled, of whom 913 were boys (54.5%). The most common maternal educational level was medium (47%). which is near to 1 SD based on the scaled scores.
The effect sizes for the multivariate analyses are displayed in the second column of Table 3. For all age groups, large effect sizes were found for the differences between the scaled scores based on the Tehran and reference norms, but not consistently for particular subtests or definite age groups (Table 3). For cognition, expressive communication, and fine motor subtests, effect sizes were generally large for all age groups. For the receptive communication subtest, effect sizes were generally large, with the exception of 4 age groups. For the gross motor subtest, effect sizes were generally large, with the exception of 6 age groups.
Using a scaled score of 7 (-1 SD) or 4 (-2 SD) as the cut-off point, McNemar tests showed that the Tehran and reference norms yielded significantly different rates of children with low scores for all subtests, except for a scaled score of 4 for fine motor (Table 4). That is, fewer children scored below 1 or 2 SD in cognition, expressive and receptive communication, and fine and gross motor performance when the reference norms were used instead of the Tehran norms.
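The paired comparison described above can be sketched as follows. Each child is classified as low or not low under both sets of norms, and only the discordant pairs (low under one norm but not the other) enter the test. This is a minimal exact (binomial) version of the McNemar test written for illustration; the function names and the ten pairs of scaled scores are hypothetical, and the study may have used a different variant of the test (for example, the chi-square approximation).

```python
from math import comb


def is_low(scaled_score, cutoff=7):
    """Flag a scaled score below the cut-off (7 for -1 SD, 4 for -2 SD)."""
    return scaled_score < cutoff


def mcnemar_exact(low_tehran, low_reference):
    """Two-sided exact McNemar test on paired low-score flags.

    Returns (b, c, p), where b counts children low only under the Tehran
    norms and c counts children low only under the reference norms.
    """
    pairs = list(zip(low_tehran, low_reference))
    b = sum(1 for t, r in pairs if t and not r)
    c = sum(1 for t, r in pairs if r and not t)
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant pairs, no evidence of a difference
    # Under H0 the discordant pairs split 50/50, so the smaller count
    # follows Binomial(n, 0.5); double the lower tail for a two-sided p.
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical scaled scores for ten children on one subtest (not study data).
tehran = [6, 5, 8, 6, 3, 9, 6, 10, 6, 6]
reference = [8, 7, 9, 7, 5, 10, 6, 11, 8, 7]

b, c, p = mcnemar_exact([is_low(s) for s in tehran],
                        [is_low(s) for s in reference])
print(b, c, p)  # 5 0 0.0625
```

In this toy example, five children are flagged only under the Tehran norms and none only under the reference norms, which mirrors the direction of the reported result; passing `cutoff=4` to `is_low` gives the -2 SD comparison.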
In addition, McNemar tests were performed on 4 age groups (Table 4). The proportions of children scoring below 1 or 2 SD using the Tehran and reference norms differed significantly for all age groups. Therefore, fewer children score below 1 and 2 SD when the reference norms are used than when the Tehran norms are used.
*'Low educational level' refers to special education, primary school, or pre-vocational secondary education (< 12 years); 'medium educational level' refers to senior general secondary education, pre-university education, or secondary vocational education (13-16 years); 'high educational level' refers to higher professional education or university (17+ years).
Note. The mean difference is calculated as the scaled score based on the Tehran norms minus the scaled score based on the US norms. Mean differences < 0 indicate that the scaled score based on the US norms was > the scaled score based on the Tehran norms. Mean differences > 0 indicate that the scaled score based on the US norms was < the scaled score based on the Tehran norms. Effect sizes are all statistically significant, p < .01, except those not in bold.